A Statistical Mechanical Approach to Protein Aggregation

We develop a theory of aggregation using statistical mechanical methods. An example of a complicated aggregation system with several levels of structures is peptide/protein self-assembly. The problem of protein aggregation is important for the understanding and treatment of neurodegenerative diseases and also for the development of bio-macromolecules as new materials. We write the effective Hamiltonian in terms of interaction energies between protein monomers, protein and solvent, as well as between protein filaments. The grand partition function can be expressed in terms of a Zimm-Bragg-like transfer matrix, which is calculated exactly and all thermodynamic properties can be obtained. We start with a two-state treatment that can be easily generalized to three or more states using a Potts model, for which the exactly solvable feature of the model remains. We focus on $n \times N$ ladder systems, corresponding to the ordered structures observed in some real fibrils. We have obtained results on nucleation processes and phase diagrams, in which a protein property such as the aggregate concentration is expressed as a function of the initial protein concentration and inter-protein or interfacial interaction energies. We have applied our methods to A$\beta$(1-40) and Curli fibrils and obtained results in good agreement with experiments.

I. INTRODUCTION

In addition to folding into the unique native structures of globular proteins, a general property of a protein is its self-assembly into aggregates and fibrils under certain conditions. Unlike the reversible native structure, formation of solid fibrils can be irreversible and results in an overall stable state of the protein. Aggregates, fibrils, and plaques are often associated with human neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, diabetes, and prion-related diseases, to name just a few. For this reason, it is important to understand the mechanisms and pathways of the associated aggregation processes.

Because of the large number of degrees of freedom involved, protein aggregation is difficult to study using all-atom molecular dynamics methods. Simplifying approximations or coarse-grained models are often introduced. An alternative is to use statistical mechanical and thermodynamic approaches, which greatly reduce the number of degrees of freedom or parameters involved. Several such statistical mechanical approaches have recently appeared in the literature. For example, van Gestel et al. have developed a simple two-state model for studying helix-coil or sheet-coil transitions in aggregates along with a polymerization transition. Schmidt et al. and others focus on a well-defined pathway of aggregation including monomer, oligomer, and fibril structures. Zamparo et al. generalized the WSME model to study protein aggregation that includes helix-sheet transitions. Earlier, Skolnick et al. and others used the Zimm-Bragg model for protein folding to study tertiary interactions between neighboring helical proteins.

The Zimm-Bragg and Ising-like models have been extended and applied to the study of protein aggregation problems, starting from effective Hamiltonians or partition functions. In one example, van Gestel et al. assumed that a bond linking two proteins can take coil and helix states or coil and sheet states. In Zamparo's models, a protein can take helix or sheet conformations. In reality, a protein can take all three (or more) conformations, resulting in richer pathways and properties. Thus, it may be advantageous to introduce a three-state or, more generally, a $q$-state model, where $q = 2, 3, 4, \ldots$ is an integer. This can easily be accomplished by using a $q$-state Potts model. Another strength of Zimm-Bragg-type approaches is the use of transfer matrices, which makes exact or analytic solutions possible. The exactly solvable feature is retained in a Potts model.



The purpose of the present article is to develop a statistical mechanical theory of protein aggregation based on an effective Hamiltonian, a Potts model, partition functions, and transfer matrices. From this theory, we can obtain thermodynamic and nucleation properties associated with the self-assembly process of proteins. In the next section we describe the system, the aggregation pathways that we investigate, and the aggregate and solution phases. We also describe effective Hamiltonians for a single aggregate and the statistical mechanical methods that are used. In Section 3 we explicitly include solvent (water) interactions and define an effective Hamiltonian for the formation of critical nuclei. We then calculate a few experimentally relevant thermodynamic quantities. In Section 4, we include inter-filament interactions to model full fibrils. In Section 5, our theory is applied to the aggregation of A$\beta$(1-40) and Curli fibril systems, and results are compared to experimental observations. Finally, in Section 6, we discuss a helix-sheet-coil aggregation model.

II. SYSTEMS STUDIED

We consider the protein aggregation pathway from monomers to dimers, trimers, ..., oligomers, ..., filaments, proto-fibrils, and fibrils. In general not all oligomers or aggregates are stable, but monomers, fibrils, and sometimes oligomers are observable on experimental time-scales. Here, we assume that all these species are in kinetic equilibrium and are interested in thermodynamic properties of aggregates. We assume the monomer is an unstructured protein, but in reality it can be a collapsed coil, which could be taken into consideration in a more detailed model. A filament is a linear chain of interacting, identical proteins that we fix to a 1D (or quasi-1D) lattice. This is reasonable because aggregates from oligomers to proto-fibrils are still soluble and floating around in solution. Initially, we focus on one of them at a time. The lattice coordinate runs along the sequence of the chain and does not necessarily imply that the chain is geometrically straight. We consider the chain as a sub-system in a volume of solution. Several filaments or proto-fibrils are known to assemble into fibrils, where participating filaments and proto-fibrils are held together by stabilizing interactions. In our studies, these structures are put onto strip lattices ($2 \times N$, $3 \times N$, ..., $n \times N$), by which we can model lateral interactions between the $n$ filaments that comprise a single fibril.

The aggregate phase is the strip (or 1D) lattice that may be occupied by aggregates and any other species, including solvent clusters. This phase is in equilibrium with dissolved proteins in the solvent phase. The chemical potential for protein monomers in the solution can be written as

$$\mu_{\mathrm{soln}} = \mu_{ST} + \mu_{SR} + RT \ln c \tag{1}$$

where the subscript 'S' stands for solution, $\mu_{ST}$ and $\mu_{SR}$ are the free energy contributions arising from the translational and rotational motional freedom that monomers possess in solution, respectively, and $c$ is the concentration of monomers in solution. For the chemical potential of the aggregates, $\mu_{\mathrm{agg}}$, we assume a crystalline approximation so that $\mu_{\mathrm{agg}}$ can be written as

$$\mu_{\mathrm{agg}} = \mu_{PC} + \mu_{PV} \tag{2}$$

where 'P' stands for polymer of proteins, and $\mu_{PC}$ and $\mu_{PV}$ are the free energy contributions arising from the contact and interface interactions between proteins in aggregates, and from the vibrational motional freedom that proteins in aggregates possess, respectively. Equilibrium between a solution phase and an aggregate phase of protein is then given by

$$\mu_{\mathrm{agg}} = \mu_{\mathrm{soln}}. \tag{3}$$

With the simple statistical mechanical model presented in the sections to follow, we can relate the chemical potential contribution from the interactions between proteins in aggregates, $\mu_{PC}$, to the experimental concentration of protein in solution via Eq. (3). We present several different versions of effective Hamiltonians for describing the interactions between proteins in aggregates below.
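
As a rough numerical illustration of Eqs. (1)-(3), the sketch below solves the equilibrium condition for $\mu_{PC}$ at a given monomer concentration. The function names, the molar units (J/mol), the use of a dimensionless concentration measured relative to a reference state, and the zero placeholder values for $\mu_{ST}$, $\mu_{SR}$, and $\mu_{PV}$ are all assumptions made for illustration and are not taken from the article.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol K)


def mu_soln(c, mu_st, mu_sr, temperature):
    """Eq. (1): chemical potential of monomers in solution (J/mol).

    c is assumed dimensionless (concentration relative to a reference state).
    """
    return mu_st + mu_sr + R_GAS * temperature * math.log(c)


def mu_pc_at_equilibrium(c, mu_st, mu_sr, mu_pv, temperature):
    """Solve Eqs. (2)-(3), mu_PC + mu_PV = mu_soln, for the contact term mu_PC."""
    return mu_soln(c, mu_st, mu_sr, temperature) - mu_pv


if __name__ == "__main__":
    T = 298.15  # K
    # Placeholder constants (assumed zero): only differences in mu_PC matter here.
    low = mu_pc_at_equilibrium(1e-6, 0.0, 0.0, 0.0, T)
    high = mu_pc_at_equilibrium(1e-5, 0.0, 0.0, 0.0, T)
    print(f"shift in mu_PC per decade of concentration: {high - low:.1f} J/mol")
```

Because the other contributions are constants at fixed temperature, a tenfold change in concentration shifts $\mu_{PC}$ by $RT \ln 10$ regardless of the placeholder values chosen.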

Perhaps the most salient feature of amyloid fibrils is the cross-beta structure, but conformations such as helix and coil may play roles in the early stages of fibrillization. Our aggregation model does not start with residue-residue interactions, but with individual protein molecules, which are classified into coil, helix, and sheet proteins, as defined below. In a Zimm-Bragg-like model, order parameters, $\theta$, for the protein are defined as the fractions of each secondary structure in a protein. When the protein is completely unfolded or folded, $\theta = 0$ or $1$, respectively. In our model, a 'sheet' protein is one which is dominated by sheet or hairpin structures, where on average $\theta_{\mathrm{sheet}} > \theta_{\mathrm{helix}}$ and $\theta_{\mathrm{sheet}} > \theta_{\mathrm{coil}}$, which means that the majority of the residues are involved in the formation of sheet structure. A 'helix' protein is defined similarly: on average $\theta_{\mathrm{helix}} > \theta_{\mathrm{sheet}}$ and $\theta_{\mathrm{helix}} > \theta_{\mathrm{coil}}$, and the protein is majority helical. The random coil lacks secondary structure. To reduce the number of parameters needed to describe protein aggregation, we do not specify conformations other than helix, sheet, or coil. Generally, any number of stable conformations could be included in a model description, and thus instead of using an Ising-like model, we express our Hamiltonian in terms of a Potts model with $q$ states, where $q = 1, 2, 3, \ldots$

A simple effective Hamiltonian for the interactions between $N$ proteins that compose a single filament on a 1D lattice, where each protein can be in a helical, sheet, or coil conformation, can be written in terms of a three-state Potts model as

$$-\beta\mathcal{H}_{\mathrm{fil}} = P_1\sum_{i=1}^{N-1}\delta(t_i,1) + P_2\sum_{i=1}^{N-1}\delta(t_i,2) - \sum_{i=1}^{N-1} R(t_i,t_{i+1})\left[1-\delta(t_i,t_{i+1})\right] + (N-1)K \tag{4}$$

where $\beta = 1/k_BT$ and $\delta(x,y)$ is the Kronecker delta, which equals one if $x = y$ and zero otherwise. Eq. (4) is a $q$-state Potts-type model, where the generalized spin variables can take values $t = 0, 1, \ldots, q-1$. For aggregation, the spin states correspond to protein conformations, where $t = 0, 1, 2$ indicates that a protein is in a random coil, a sheet, or a helical conformation, respectively. The first and second terms are non-zero only when the $i$th protein is in a sheet or helix conformation, respectively. The free energies described by $P_0$, $P_1$, or $P_2$ refer to the interaction between the $i$th protein that is coil, sheet, or helical, respectively, and the nearest neighbor protein at location $i+1$. Hence the summation in the first two terms runs to $N-1$ instead of $N$. Even though we think of $P_0$, $P_1$, or $P_2$ as an interaction energy between two neighboring proteins, the energetic weights of these energies are associated with the $i$th protein, and indeed depend on the conformation of the $i$th protein. We set the coil interaction energy, $P_0$, to zero, which serves as a reference for the helix and sheet interactions. Thus, if $P_1 < 0$ ($P_2 < 0$), the random coil interaction is more stable than the sheet (or helix) interaction; if $P_1 > 0$ ($P_2 > 0$), the sheet (or helix) interaction is more stable than the random coil interaction. $K > 0$ is an association energy between two monomers that does not depend on conformation. Since $K$ simply links two nearest neighboring monomers, the number of $K$ interactions may be thought of as the degree of polymerization of aggregates. In this way, a dimer composed of two coil monomers will have energy equal to $K$, whereas otherwise dimers would be indistinguishable from monomers.

The third term in Eq. (4) is a free-energetic penalty associated with the interface between different regions of structure. These interface penalties are parameterized by energies $R_j > 0$, where $j = 0$, 1, or 2 refers to helix-coil, sheet-coil, and helix-sheet boundaries, respectively. The notation $R(t_i, t_{i+1})$ refers to the energy of the specific type of boundary: helix-coil or coil-helix boundaries, $R(0,2) = R(2,0) = R_0$; sheet-coil or coil-sheet boundaries, $R(0,1) = R(1,0) = R_1$; and sheet-helix or helix-sheet boundaries, $R(2,1) = R(1,2) = R_2$. Note that the index $j$ in $R_j$ does not correspond to $q$ of the Potts model. As to the physical origins of the $R$-terms, they can arise from effective repulsive interactions between neighboring proteins of different conformations, but more likely they arise from the loss of entropy at the boundaries between regions of different conformations. $R$ can then be thought of as an initiation parameter, or a barrier to overcome. Overall, six parameters, which are summarized in Fig. 1(a), are needed to describe possible interactions between proteins. However, in practice fewer may be needed because not all conformations play significant roles in aggregation. For instance, it is well known that many fibrils are dominated by cross-beta structure; therefore a two-state, sheet-coil model is a justified and important case.
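
To make the bookkeeping of the three-state filament Hamiltonian concrete, the following sketch evaluates $-\beta\mathcal{H}$ for a given sequence of conformations, with one propagation weight per bond attached to the left-hand protein, one interface penalty per boundary, and one $K$ per bond. The function name and the numerical parameter values are hypothetical and chosen only for illustration.

```python
COIL, SHEET, HELIX = 0, 1, 2  # Potts states t = 0, 1, 2 as in the text


def minus_beta_h(t, P1, P2, K, R0, R1, R2):
    """Evaluate -beta*H for a filament whose proteins have conformations t."""
    # Interface penalties keyed by the unordered pair of neighboring states.
    penalty = {
        frozenset((COIL, HELIX)): R0,
        frozenset((COIL, SHEET)): R1,
        frozenset((SHEET, HELIX)): R2,
    }
    total = 0.0
    for left, right in zip(t[:-1], t[1:]):  # bonds i = 1 .. N-1
        if left == SHEET:
            total += P1  # weight is attached to the i-th (left) protein
        elif left == HELIX:
            total += P2
        if left != right:
            total -= penalty[frozenset((left, right))]
        total += K  # one association energy per bond
    return total


if __name__ == "__main__":
    # Illustrative values in units of k_B T (assumptions, not fitted parameters).
    params = dict(P1=1.0, P2=0.5, K=1.0, R0=1.0, R1=1.0, R2=1.5)
    print(minus_beta_h([COIL, COIL], **params))            # coil dimer: only K
    print(minus_beta_h([SHEET, SHEET, SHEET], **params))   # two P1 and two K
    print(minus_beta_h([SHEET, COIL, HELIX], **params))    # mixed filament
```

Note that, because the sums run to $N-1$, the conformation of the last protein enters only through the interface term.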

In general, with a simpler two-state system we can model sheet-coil, helix-coil, or even helix-sheet systems using $q = 2$ Potts-type interactions, which can be reduced to an Ising-type model. As an example, let $t_i = -1, +1$ refer to whether the $i$th protein is in a random coil or sheet conformation, respectively. The effective Hamiltonian for a Potts model for sheet-coil filaments is

$$-\beta\mathcal{H}_{\mathrm{fil}} = P_1\sum_{i=1}^{N-1}\delta(t_i,1) - R_1\sum_{i=1}^{N-1}\left[1-\delta(t_i,t_{i+1})\right] + (N-1)K \tag{5}$$

where the coil is taken as the reference state. As with Potts models, the term $P_1$ corresponds to a "magnetic-field" strength and $R_1$ to the spin-spin interaction, and the Boltzmann weights $\sigma_1 = \exp(-2R_1)$ and $s_1 = \exp(P_1)$ are the Zimm-Bragg-like "initiation" and "propagation" parameters for sheet-coil protein aggregation. By substituting the identity $\delta(t_i,t_j) = \frac{1}{2}(1 + t_i t_j)$ into Eq. (5) and simplifying, we get the Ising-type aggregation model of van Gestel et al., Eq. (2) in Ref. 9. The only difference in our approach is that we assume a spin variable $t$ refers to a protein conformation, whereas in Ref. 9 $t$ refers to the state of a bond between proteins.
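
The reduction from the two-state Potts form to the Ising form can be checked by brute force. The sketch below assumes that the two-state Hamiltonian is the $q = 2$ restriction of Eq. (4), with a propagation term $P_1$, an interface term $R_1$, and one $K$ per bond, and compares it with the expression obtained after substituting $\delta(x,y) = \frac{1}{2}(1+xy)$. The function names and parameter values are illustrative assumptions.

```python
from itertools import product


def delta(x, y):
    """Kronecker delta."""
    return 1 if x == y else 0


def potts_form(t, P1, R1, K):
    """-beta*H written with Kronecker deltas; t[i] is -1 (coil) or +1 (sheet)."""
    bonds = range(len(t) - 1)
    field = sum(P1 * delta(t[i], 1) for i in bonds)
    walls = sum(R1 * (1 - delta(t[i], t[i + 1])) for i in bonds)
    return field - walls + (len(t) - 1) * K


def ising_form(t, P1, R1, K):
    """Same quantity after substituting delta(x, y) = (1 + x*y)/2 for x, y = +/-1."""
    bonds = range(len(t) - 1)
    field = sum(0.5 * P1 * (1 + t[i]) for i in bonds)
    walls = sum(0.5 * R1 * (1 - t[i] * t[i + 1]) for i in bonds)
    return field - walls + (len(t) - 1) * K


def max_mismatch(n, P1, R1, K):
    """Largest difference between the two forms over all 2**n chains."""
    return max(
        abs(potts_form(t, P1, R1, K) - ising_form(t, P1, R1, K))
        for t in product((-1, 1), repeat=n)
    )


if __name__ == "__main__":
    # Arbitrary illustrative energies in units of k_B T.
    print(max_mismatch(8, P1=1.3, R1=0.7, K=0.4))
```

In the Ising form, $P_1/2$ plays the role of the field and $R_1/2$ the role of the nearest-neighbor coupling, up to configuration-independent constants.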

FIG. 1: (Color online) Summary of solvent and protein conformation energies. A site is occupied with a solvent, $n = 0$ (square), or a protein, $n = 1$ (circles). Only proteins may assume a particular conformation (sheet, black/solid circle; helix, red circle marked with X; coil, white circle). In (a), a $q = 3$ Potts model for helix-sheet-coil conformations is shown where only protein-protein interactions are illustrated. In (b), a dilute $q = 2$ Potts model for sheet-coil conformations is shown where both protein-protein and protein-solvent interactions are indicated, and $n_c = 1$. In both (a) and (b), interactions that stabilize the aggregate are shown with down arrows, whereas interfacial interactions between different regions of structure are drawn with up arrows.



III. EXPLICIT INCLUSION OF THE 
INTERACTIONS WITH SOLVENT 

A. Protein/Solvent interfaces 

Before calculating thermodynamic quantities of aggregates, we consider the effects of solvent on the formation and propagation of protein aggregates. It is generally believed that protein aggregation is a nucleation process, where the free energy of a small assembly increases until a nucleus with $n_c$ monomers is formed. Creation of nuclei is a slow, stochastic process. Once a nucleus is formed, it may elongate rapidly at either end by monomer addition, with the free energy going downhill, eventually forming filaments. Other more complex pathways are also possible, including the merging of aggregates. Kinetic models are often used to describe the rates of this nucleation/elongation process. In particular, recent studies by Zhang and Muthukumar and others have indicated that nuclei formation occurs only for two-, three-, ..., $n$-layer aggregates (we call them quasi-1D aggregates), and no nucleation barrier exists in 1D systems. In this section we assume that a nucleus term added to a 1D effective Hamiltonian is a coarse graining of a more realistic quasi-1D model for nuclei, where the lengths of the aggregates are much greater than their widths. This is mainly a simplification, or it can be considered as an approximation to the case where an oligomer is the fundamental unit (or particle) from which a proto-fibril is formed. On the other hand, as mature amyloid fibrils are known to contain many thousands of proteins and are non-branching structures, they grow primarily in one dimension. More accurately, in Section IV we will consider the nucleation of a quasi-1D model (or an $n \times N$ model, where $n = 1, 2, \ldots$ is much smaller than $N$). Comparison with the simpler 1D model in Fig. 6 shows that a 1D statistical mechanical model captures some of the features of the $n \times N$ or quasi-1D models.

In our model, the nucleus is a stretch of $n_c$ sites on a 1D lattice that are occupied by proteins, where aggregates are flanked by solvent on both sides, and the proteins are linked via the interaction $K$ discussed above. The solvent could be, for example, a cluster of water molecules. We assume that the nucleus surrounded by solvent defines an interface that is described by the free energy $A > 0$. This interface energy may be attributed to surface tension between solvent and a nucleus, where proteins in the nucleus may be in contact and are involved in long-range interactions. At each site $i$ on the lattice, the occupation variable $n_i = 0, 1$ indicates whether the site is occupied by solvent or protein, respectively. The 1D lattice, in turn, can be considered to be embedded in a large overall space, either 2D or 3D, which is filled with solvent and a dilute protein solution. That is, we assume the distance between any two aggregates is large enough that we can focus on a single one at a time. In this respect our approach is similar to other spin models for aggregation that put solvent and protein on the lattice.

To include the nucleus-solvent interfacial free energy, we modify Eq. (5). To ease notation, define $\chi(x,y) = 1 - \delta(x,y)$, where $x, y$ can be either spin or occupation variables located at sites $i$ and $j$; $\chi$ is zero if $x = y$ and 1 otherwise. The lattice-gas effective Hamiltonian for interactions between sheet or coil proteins as well as nuclei-solvent interfaces on a 1D lattice with $N_T$ sites is now given by

$$-\beta\mathcal{H}_{\mathrm{fil}} = -\beta\mathcal{H}_{pp} - \beta\mathcal{H}_{ps} \tag{6}$$

$$-\beta\mathcal{H}_{pp} = \sum_{i=1}^{N_T-1}\left\{P_1\delta(t_i,1) + K - R_1\chi(t_i,t_{i+1})\right\} n_i n_{i+1} - \sum_{i=1}^{N_T-1} R_1\chi(n_i,n_{i+1})\left[\delta(t_i,1)n_i + \delta(t_{i+1},1)n_{i+1}\right] \tag{7}$$

$$-\beta\mathcal{H}_{ps} = -\sum_{i=1}^{N_T-n_c-1} A\,\chi(n_i,n_{i+n_c})\prod_{j=i+1}^{i+n_c-1}\delta(n_j,1) \tag{8}$$

where 'pp' in $-\beta\mathcal{H}_{pp}$ refers to 'protein-protein' interactions and 'ps' in $-\beta\mathcal{H}_{ps}$ refers to 'protein-solvent' interactions. Eq. (8) is the effective Hamiltonian associated with a nucleus-solvent interface, with $\chi(n_i, n_{i+n_c})$ ensuring that there is solvent at site $i$ and a protein at $i+n_c$, or vice versa. The product of Kronecker terms fixes all the remaining sites between the solvent at $i$ and the protein at $i+n_c$ to be occupied by proteins. In Eq. (6), terms with $P_1$, $K$, and $R_1$ have the same meaning as in Eq. (5) and make up the effective Hamiltonian for sheet-coil filaments in the lattice-gas Potts model. Now $K$ explicitly depends on whether two neighboring sites are occupied by proteins and facilitates the elongation of an aggregate. $A$ is the nucleus-solvent interfacial free energy. To avoid introducing any more free parameters, we assume in the second summation in Eq. (7) that the interaction between a sheet protein and the solvent immediately flanking it is described by the interaction free energy $R_1$. With this convention, both ends of a sheet segment contribute a factor of $\sqrt{\sigma_1}$, regardless of whether the segment is flanked by proteins or solvent. The free energies for sheet-coil aggregates including solvent are summarized in Fig. 1(b).

Eq. (6) is a more general approach to fibril elongation than previous statistical mechanical models for protein aggregation, which focus on specific aggregation pathways. Fibrils may grow longer via monomer addition at fibril ends, which in a sense is similar to some kinetic models for elongation, in particular the model proposed by Massi and Straub. Additionally, by using the lattice-gas formalism, Eq. (6) can accommodate a variety of elongation mechanisms, including merging and fracturing of aggregates of different sizes along the 1D lattice. In reality the merging of filaments and proto-fibrils is not a 1D process, and in Section IV we will consider a related effective Hamiltonian on a strip lattice to model interactions between 1D filaments. In this section, we focus on a 1D model for aggregate elongation.



B. Average properties and thermodynamics 

Now that we have discussed protein-protein and protein-solvent interactions, we can calculate thermodynamic quantities and test model predictions against experimental data. First, we must calculate the partition function for Eq. (6). Since the number of proteins on the 1D lattice may fluctuate, we work within the grand canonical ensemble, where $N_T$ refers to the total number of lattice sites and $N_p = \sum_i n_i$ refers to the total number of lattice sites occupied by proteins. $\Xi$ is the grand partition function. Substituting Eqs. (7) and (8) into Eq. (6), we write $\Xi$ for the lattice-gas filament model as

$$\Xi = \sum_{\{t\},\{n\}} \exp\left(-\beta\mathcal{H}_{\mathrm{fil}} + \beta\mu_{PC} N_p\right) \tag{9}$$

where $\beta\mu_{PC}$ is the dimensionless chemical potential arising from the contact and interface interactions between proteins in aggregates, and the notation $\{t\},\{n\}$ means summation over the spin and occupancy variables, respectively, at each site. For $N_T > 2n_c$, $\Xi$ may be solved for exactly by a transfer matrix $T$ as

$$\Xi = \sum_{\{t\},\{n\}} \prod_{i=1}^{N_T} T(t_i, t_{i+1}, n_i, n_{i+1}, \ldots, n_{i+n_c}) \tag{10}$$

$$\begin{aligned} T(t_i, t_{i+1}, n_i, n_{i+1}, \ldots, n_{i+n_c}) = {} & \exp\left\{\left[P_1\delta(t_i,1) + K - R_1\chi(t_i,t_{i+1})\right] n_i n_{i+1}\right\} \\ & \times \exp\left\{-A\,\chi(n_i,n_{i+n_c}) \prod_{j=i+1}^{i+n_c-1}\delta(n_j,1)\right\} \\ & \times \exp\left\{-R_1\chi(n_i,n_{i+1})\left[\delta(t_i,1)n_i + \delta(t_{i+1},1)n_{i+1}\right] + \beta\mu_{PC} n_i\right\} \end{aligned} \tag{11}$$

where we sum over conformations only if the $i$th site is occupied by a protein, i.e., $n_i = 1$. Notice that the parameter for sheet propagation is counted only when the $i$th site is a sheet protein. As an explicit example of writing out the transfer matrix, we consider the case $n_c = 1$ for a two-state system. Using Eq. (11) gives the elements of the following matrix



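$$T = \begin{array}{c|ccc} (t_i, n_i) \backslash (t_{i+1}, n_{i+1}) & \text{solvent} & (-1, 1) & (1, 1) \\ \hline \text{solvent} & 1 & \sqrt{a} & \sqrt{a\sigma_1} \\ (-1, 1) & z\sqrt{a} & kz & kz\sqrt{\sigma_1} \\ (1, 1) & z\sqrt{a\sigma_1} & kzs_1\sqrt{\sigma_1} & kzs_1 \end{array} \tag{12}$$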



where $s_1$ and $\sigma_1$ were defined above, $k = \exp(K)$ and $a = \exp(-2A)$ are the Zimm-Bragg-like parameters, and $z = \exp(\beta\mu_{PC})$. The matrix elements $T_{ij}$ are the statistical weights of each type of interaction. For general $q$ and $n_c$, the transfer matrix has dimension $(q+1)^{n_c} \times (q+1)^{n_c}$ and $N_\lambda = (q+1)^{n_c}$ eigenvalues.
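
The $n_c = 1$ matrix elements can be cross-checked against the per-bond Boltzmann weight. The sketch below assumes that the exponent of the transfer matrix for $n_c = 1$ is $[P_1\delta(t_i,1) + K - R_1\chi(t_i,t_{i+1})]n_i n_{i+1} - A\chi(n_i,n_{i+1}) - R_1\chi(n_i,n_{i+1})[\delta(t_i,1)n_i + \delta(t_{i+1},1)n_{i+1}] + \beta\mu_{PC} n_i$, and compares it entry by entry with the closed form written in terms of $s_1$, $\sigma_1$, $k$, $a$, and $z$. The function names, the state ordering (solvent, coil, sheet), and the parameter values are assumptions made for illustration.

```python
import math

import numpy as np

SOLVENT, COIL, SHEET = 0, 1, 2  # site states used to index the matrix


def pair_weight(si, sj, P1, K, R1, A, beta_mu):
    """Boltzmann weight of neighboring sites (si, sj) for n_c = 1."""
    ni, nj = int(si != SOLVENT), int(sj != SOLVENT)
    sheet_i, sheet_j = int(si == SHEET), int(sj == SHEET)
    exponent = (P1 * sheet_i + K - R1 * int(si != sj)) * ni * nj
    exponent -= A * int(ni != nj)                         # nucleus-solvent interface
    exponent -= R1 * int(ni != nj) * (sheet_i + sheet_j)  # sheet end facing solvent
    exponent += beta_mu * ni                              # chemical potential of site i
    return math.exp(exponent)


def matrix_from_weights(**p):
    """Build the 3x3 matrix directly from the pair weights."""
    states = (SOLVENT, COIL, SHEET)
    return np.array([[pair_weight(si, sj, **p) for sj in states] for si in states])


def matrix_closed_form(P1, K, R1, A, beta_mu):
    """Same matrix written with the Zimm-Bragg-like parameters."""
    s1, sigma1 = math.exp(P1), math.exp(-2.0 * R1)
    k, a, z = math.exp(K), math.exp(-2.0 * A), math.exp(beta_mu)
    ra, rs = math.sqrt(a), math.sqrt(sigma1)
    return np.array([
        [1.0,         ra,              ra * rs],
        [z * ra,      k * z,           k * z * rs],
        [z * ra * rs, k * z * s1 * rs, k * z * s1],
    ])


if __name__ == "__main__":
    p = dict(P1=1.0, K=1.0, R1=1.0, A=1.0, beta_mu=-0.5)  # illustrative values
    print(np.allclose(matrix_from_weights(**p), matrix_closed_form(**p)))
```

Note that individual entries can exceed one (for example $kzs_1$ at these values), which is why they are weights rather than probabilities.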

Now we can write $\Xi$ and calculate thermodynamic properties using the eigenvalues of the transfer matrix. For a finite lattice, boundary conditions must be specified. These could be open, where either end of the lattice could be occupied by a protein of a specified conformation or by solvent, or periodic, where the lattice simply forms a ring. In either case, we have

$$\Xi = \sum_{i=1}^{N_\lambda} x_i \lambda_i^{N_T} \tag{13}$$

where the coefficients $x_i$ are determined by the specified boundary conditions. If periodic boundary conditions are imposed, we set $t_{N_T-n_c+1} = t_1$, $t_{N_T-n_c+2} = t_2$, ..., $t_{N_T} = t_{n_c}$ so that all coefficients $x_i$ are unity and the partition function is found easily from

$$\begin{aligned} \Xi &= \mathrm{Tr}\left(T^{N_T}\right) = \sum_{i=1}^{N_\lambda}\lambda_i^{N_T} && (14) \\ &\approx \lambda_1^{N_T} && (15) \end{aligned}$$

where Tr is the trace operation, $\lambda_1$ is the largest eigenvalue of the transfer matrix, $\lambda_2$ is the second largest eigenvalue of the transfer matrix, and so on. Eq. (15) is valid when the lattice grows large, and in the thermodynamic limit $N_T \to \infty$,

$$N_T^{-1}\ln\Xi = \ln\lambda_1. \tag{16}$$

FIG. 2: (Color online) Plot (a) illustrates the effect of varying the contact chemical potential, $\beta\mu_{PC}$, on protein number, $\langle N_p\rangle$, with $n_c = 2$ (solid, black), 4 (dashed, blue), 6 (dotted, red). In (b), $\langle N_p\rangle$ vs. $\beta\mu_{PC}$ is shown for $K = 0\,k_BT$ (solid, black), $1\,k_BT$ (dashed, blue), and $2.5\,k_BT$ (dotted, red). Unless otherwise stated, $K = P_1 = 1\,k_BT$, $R_1 = A = 1\,k_BT$, $N_T = 1000$, and $n_c = 2$.

Finally, we calculate some properties of the system. Of particular interest are the average number of proteins on the lattice, $\langle N_p\rangle$, which we refer to as the occupation of the lattice, the number of filaments, $\langle\gamma\rangle$, the number of proteins in filaments, $\langle\psi\rangle$, the number of sheet segments, $\langle\nu\rangle$, and the number of sheet proteins in filaments, $\langle\theta\rangle$, as

$$\langle N_p\rangle = z\frac{\partial}{\partial z}\ln\Xi \tag{17}$$

$$\langle\gamma\rangle = -\frac{1}{2}\frac{\partial}{\partial A}\ln\Xi \tag{18}$$

$$\langle\psi\rangle = \frac{\partial}{\partial K}\ln\Xi + \langle\gamma\rangle \tag{19}$$

$$\langle\nu\rangle = -\frac{1}{2}\frac{\partial}{\partial R_1}\ln\Xi \tag{20}$$

$$\langle\theta\rangle = \frac{\partial}{\partial P_1}\ln\Xi \tag{21}$$

respectively. In each expression all energies except the one being varied are held constant upon differentiation. A factor of 1/2 in Eqs. (18) and (20) corrects the over-counting of the number of distinct filaments and extended sheet regions. We also calculate the average length of aggregates, $\langle L_p\rangle$, and the average length of sheet segments, $\langle L_s\rangle$, according to

$$\langle L_p\rangle = \frac{\langle\psi\rangle}{\langle\gamma\rangle + 1} \tag{22}$$

$$\langle L_s\rangle = \frac{\langle\theta\rangle}{\langle\nu\rangle + 1} \tag{23}$$

respectively. The factor of 1 in Eqs. (22) and (23) accounts for the case where proteins completely occupy the lattice.
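
The relation between derivatives of $\ln\Xi$ and lattice averages can be verified numerically on a small ring. The sketch below is restricted to $n_c = 1$ and the two-state model, and it assumes periodic boundary conditions with $\Xi = \mathrm{Tr}(T^{N_T})$. It compares $\langle N_p\rangle$ obtained from a central finite difference of $\ln\Xi$ with respect to $\beta\mu_{PC}$ against a direct enumeration of all configurations, and it also checks that $\ln\Xi/N_T$ approaches $\ln\lambda_1$ for a longer ring. The function names, the step size `h`, and the parameter values are assumptions for illustration.

```python
import itertools
import math

import numpy as np


def transfer_matrix(P1, K, R1, A, beta_mu):
    """3x3 matrix for n_c = 1; state order: solvent, coil protein, sheet protein."""
    s1, rs = math.exp(P1), math.exp(-R1)                       # rs = sqrt(sigma_1)
    k, ra, z = math.exp(K), math.exp(-A), math.exp(beta_mu)    # ra = sqrt(a)
    return np.array([
        [1.0,         ra,              ra * rs],
        [z * ra,      k * z,           k * z * rs],
        [z * ra * rs, k * z * s1 * rs, k * z * s1],
    ])


def log_xi(n_sites, **p):
    """ln Xi = ln Tr(T^N) for a ring of n_sites."""
    T = transfer_matrix(**p)
    return math.log(np.trace(np.linalg.matrix_power(T, n_sites)))


def mean_proteins_derivative(n_sites, h=1e-5, **p):
    """<N_p> = d ln Xi / d(beta*mu), by central finite difference."""
    up = dict(p, beta_mu=p["beta_mu"] + h)
    down = dict(p, beta_mu=p["beta_mu"] - h)
    return (log_xi(n_sites, **up) - log_xi(n_sites, **down)) / (2.0 * h)


def mean_proteins_enumeration(n_sites, **p):
    """<N_p> by summing over all 3**n_sites ring configurations."""
    T = transfer_matrix(**p)
    weight_sum = 0.0
    protein_sum = 0.0
    for states in itertools.product(range(3), repeat=n_sites):
        w = 1.0
        for i in range(n_sites):
            w *= T[states[i], states[(i + 1) % n_sites]]
        weight_sum += w
        protein_sum += w * sum(1 for s in states if s != 0)
    return protein_sum / weight_sum


def log_lambda_max(**p):
    """ln of the largest eigenvalue, i.e. ln Xi / N_T in the thermodynamic limit."""
    return math.log(np.linalg.eigvals(transfer_matrix(**p)).real.max())


if __name__ == "__main__":
    p = dict(P1=1.0, K=1.0, R1=1.0, A=1.0, beta_mu=-1.0)  # illustrative values
    print(mean_proteins_derivative(5, **p), mean_proteins_enumeration(5, **p))
    print(log_xi(200, **p) / 200, log_lambda_max(**p))
```

The same finite-difference approach applies to derivatives with respect to $A$, $K$, $R_1$, and $P_1$; for very long rings it is safer to work with $\ln\lambda_1$ directly, because $\mathrm{Tr}(T^{N_T})$ can overflow in floating point.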



C. Numerical Results 

In this section we compute the thermodynamic quantities represented by Eqs. (17)-(23) for varying system parameters and different $n_c$. For later use, we define the normalized number of proteins on the lattice (referred to as the coverage) as

$$\phi = \frac{\langle N_p\rangle}{N_T}. \tag{24}$$

$\phi$ can also be thought of as the concentration of proteins in the aggregate phase.

FIG. 3: (Color online) Plot (a) illustrates the effect of varying $n_c$ and protein number, $\langle N_p\rangle$, on $\langle\gamma\rangle$, i.e., Eq. (18), with $n_c = 1$ (solid, black), 2 (dashed, blue), 3 (dotted, red), 4 (dashed-dotted, green), 5 (purple triangles), 6 (cyan circles). In (b), $\langle\gamma\rangle$ vs. concentration is shown for $K = 0\,k_BT$ (dashed, black), $1\,k_BT$ (dotted, blue), and $2.5\,k_BT$ (dotted-dashed, red). Plot (c) illustrates the effect of varying $n_c$ and $\langle N_p\rangle$ on $\langle\nu\rangle$, i.e., Eq. (20), with $n_c = 2$ (dashed, blue), 4 (dashed-dotted, green), 6 (cyan circles). In (d), $\langle\nu\rangle$ vs. $\langle N_p\rangle$ is shown for $P_1 = 0\,k_BT$ (dashed, black), $1\,k_BT$ (dotted, blue), and $2\,k_BT$ (dotted-dashed, red). In (e) and (f), $\langle\psi\rangle$ and $\langle L_p\rangle$ are plotted against $\langle N_p\rangle$, respectively, where in both plots $K = 0\,k_BT$ (dashed, black), $1\,k_BT$ (dotted, blue), and $2.5\,k_BT$ (dotted-dashed, red). Finally, in (g) and (h), $\langle\theta\rangle$ and $\langle L_s\rangle$ are plotted against $\langle N_p\rangle$, respectively, where in both plots $P_1 = 0\,k_BT$ (dashed, black), $1\,k_BT$ (dotted, blue), and $2\,k_BT$ (dotted-dashed, red). Unless otherwise stated, $K = P_1 = 1\,k_BT$, $R_1 = A = 1\,k_BT$, $N_T = 1000$, and $n_c = 2$.

In Fig. 2 we plot the average number of proteins on the 1D lattice, $\langle N_p\rangle$, versus the chemical potential contribution from the contacts, $\mu_{PC}$. The values of $\mu_{PV}$, $\mu_{ST}$, and $\mu_{SR}$ in Eqs. (1) and (2) are regarded as constants at a specified temperature. Thus varying $\mu_{PC}$, through Eq. (3), can be accomplished by changing the experimental concentration, $c$, of protein in solution. Both Fig. 2(a) and (b) illustrate the dependence of $\langle N_p\rangle$ on $\mu_{PC}$, where for large, negative values of $\mu_{PC}$, almost no proteins are found on the lattice and in aggregates. In other words, at low protein concentration, $c$, very few aggregates are found. As the protein concentration $c$ increases (i.e., $\mu_{PC}$ increases), proteins form aggregates in greater numbers, and at an increasing rate as the lattice becomes nearly half saturated. Further increasing the protein concentration in solution allows more monomers to join aggregates rather easily until the lattice becomes saturated. In Fig. 2(a), the effect of varying the critical nucleus size, $n_c$, on the average number of proteins, $\langle N_p\rangle$, is illustrated; increasing $n_c$ is seen to have only a marginal effect on the dependence of $\langle N_p\rangle$ on $\mu_{PC}$. In Fig. 2(b), by contrast, varying the system parameters that set the contact strengths clearly influences the average number of proteins on the lattice at particular experimental concentrations. For instance, as illustrated in Fig. 2(b), increasing the strength of interactions between proteins, $K$, causes proteins to join aggregates at lower concentrations of monomers in solution.
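
The qualitative trend with $K$ can be reproduced with a minimal sketch in the thermodynamic limit, where the coverage follows from $\phi = \partial\ln\lambda_1/\partial(\beta\mu_{PC})$. The sketch assumes the $n_c = 1$ two-state transfer matrix (the figures in the text use $n_c = 2$, so only the trend is comparable, not the numbers), a central finite difference with step `h`, and illustrative energies of $1\,k_BT$; the function names are hypothetical.

```python
import math

import numpy as np


def log_lambda_max(beta_mu, K, P1=1.0, R1=1.0, A=1.0):
    """ln(lambda_1) of the n_c = 1, two-state transfer matrix (assumed form)."""
    s1, rs = math.exp(P1), math.exp(-R1)                      # rs = sqrt(sigma_1)
    k, ra, z = math.exp(K), math.exp(-A), math.exp(beta_mu)   # ra = sqrt(a)
    T = np.array([
        [1.0,         ra,              ra * rs],
        [z * ra,      k * z,           k * z * rs],
        [z * ra * rs, k * z * s1 * rs, k * z * s1],
    ])
    return math.log(np.linalg.eigvals(T).real.max())


def coverage(beta_mu, K, h=1e-5, **energies):
    """phi = d ln(lambda_1) / d(beta*mu), by central finite difference."""
    up = log_lambda_max(beta_mu + h, K, **energies)
    down = log_lambda_max(beta_mu - h, K, **energies)
    return (up - down) / (2.0 * h)


if __name__ == "__main__":
    for K in (0.0, 1.0, 2.5):
        row = [coverage(mu, K) for mu in (-6.0, -4.0, -2.0, 0.0, 2.0)]
        print(f"K = {K}:", [round(x, 3) for x in row])
```

At fixed $\beta\mu_{PC}$ the coverage grows with $K$, which is the sense in which stronger association lets proteins join aggregates at lower solution concentration.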

In Figs. 3(a)-(h) we consider a two-state sheet-coil model on a finite lattice with $N_T = 1000$ total sites and periodic boundary conditions imposed. In Fig. 3(a), we show the effects of varying $n_c$. As protein occupation, $\langle N_p\rangle$, increases, the number of filaments increases to a maximum value; then the number of filaments decreases with $\langle N_p\rangle$ as the lattice becomes saturated with proteins. Increasing $n_c$ from 1 to 6 progressively increases the value of $\langle N_p\rangle$ at both the onset of filament nucleation and the maximum number of filaments, and also decreases filament numbers overall for all values of $\langle N_p\rangle$. As shown in Fig. 3(b), increasing the association energy between monomers, $K$, from $0\,k_BT$ shifts the value of $\langle N_p\rangle$ where $\langle\gamma\rangle$, the number of filaments, reaches a maximum to lower values. Additionally, increasing $K$ causes $\langle\gamma\rangle$ to rise faster at low protein occupation, while also progressively reducing the overall number of filaments at values of $\langle N_p\rangle$ away from zero.
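
The rise and fall of the filament number with occupation can be illustrated in the thermodynamic limit. The sketch below assumes that the filament density per site is $-\frac{1}{2}\,\partial\ln\lambda_1/\partial A$, so that each pair of nucleus-solvent interfaces is counted once, and it uses the $n_c = 1$ two-state transfer matrix rather than the $n_c = 2$, $N_T = 1000$ ring of the figures. Function names, the step size `h`, and all parameter values are illustrative assumptions.

```python
import math

import numpy as np


def log_lambda_max(beta_mu, A, K=1.0, P1=1.0, R1=1.0):
    """ln(lambda_1) of the n_c = 1, two-state transfer matrix (assumed form)."""
    s1, rs = math.exp(P1), math.exp(-R1)                      # rs = sqrt(sigma_1)
    k, ra, z = math.exp(K), math.exp(-A), math.exp(beta_mu)   # ra = sqrt(a)
    T = np.array([
        [1.0,         ra,              ra * rs],
        [z * ra,      k * z,           k * z * rs],
        [z * ra * rs, k * z * s1 * rs, k * z * s1],
    ])
    return math.log(np.linalg.eigvals(T).real.max())


def filament_density(beta_mu, A=1.0, h=1e-5, **energies):
    """Filaments per site: -(1/2) d ln(lambda_1)/dA, by central finite difference."""
    up = log_lambda_max(beta_mu, A + h, **energies)
    down = log_lambda_max(beta_mu, A - h, **energies)
    return -0.5 * (up - down) / (2.0 * h)


if __name__ == "__main__":
    for mu in (-10.0, -6.0, -4.0, -2.0, 0.0, 2.0, 10.0):
        print(f"beta*mu = {mu:6.1f}  filaments per site = {filament_density(mu):.3e}")
```

Few filaments exist when the lattice is nearly empty or nearly full, and the density peaks in between, where solvent and protein stretches alternate most often.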

In Fig. 3(c) and (d), we plot the number of sheet segments, $\langle\nu\rangle$, versus the protein occupation, $\langle N_p\rangle$, for various $n_c$ (Fig. 3(c)) and $P_1$ (Fig. 3(d)). The number of sheet segments, $\langle\nu\rangle$, increases with $\langle N_p\rangle$ until reaching a maximum, then decreases toward a common value at maximum protein occupation, $\langle N_p\rangle = N_T$. In Fig. 3(c), increasing $n_c$ increases the maximum number of sheet segments, since larger nuclei may contain more sheet-coil interfaces than smaller nuclei. Also, the maximum of $\langle\nu\rangle$ occurs at progressively lower protein occupation as $n_c$ increases. Fig. 3(d) shows that increasing the interaction strength between sheet proteins, $P_1$, reduces the total number of sheet segments for all but the lowest values of protein occupation, while increasing the average length of the sheet segments (see Fig. 3(h)). The maximum value for $\langle\nu\rangle$ is also achieved at lower protein occupations for increasing $P_1$.

FIG. 4: (Color online) Phase plots of the number of sheet segments, $\langle\nu\rangle$ (solid, black lines), and the number of proteins in sheet segments, $\langle\theta\rangle$, versus system parameters (a) $\sigma_1 = \exp(-2R_1)$ and (b) $s_1 = \exp(P_1)$, all versus coverage, $\phi$. In both plots, normalized $\langle\theta\rangle$ at a particular $(\sigma_1, \phi)$ may vary from zero to one, with white indicating $\langle\theta\rangle = 0$ and solid red indicating $\langle\theta\rangle = 1$. The white-to-red gradation represents values in the range 0 to 1. Dashed (blue) lines indicate where filaments contain equal parts sheet and coil, which we define to be the locations of gradual, conformational phase transitions. Regions to the left of the dashed lines indicate filaments that are mostly composed of coils, whereas regions to the right of the dashed lines indicate filaments with majority sheet structure. Also, regions to the left of the dotted (dark blue) line indicate more solvent than proteins on the lattice, whereas regions to the right of the dotted line indicate more proteins than solvent on the lattice. We refer to the dotted line as the location of solvent/protein equal population. Unless otherwise indicated in the plots, $P_1 = 1\,k_BT$, $K = 1\,k_BT$, $R_1 = 1\,k_BT$, $A = 1\,k_BT$, and $n_c = 2$.

In Fig. 3(e), we plot the number of proteins in filaments versus protein occupation, ⟨N_p⟩. In Fig. 3(f), we plot the average length of aggregates, ⟨L_p⟩, versus protein occupation. In both figures K is varied as well. As protein occupation of the lattice increases, proteins start to join filaments, and the number of proteins in filaments increases almost linearly with ⟨N_p⟩. The lengths of filaments also increase as proteins join filaments, but not linearly. Once the lattice becomes occupied mostly by proteins, the lengths increase sharply and reach a maximum value at high protein occupation. Overall, increasing K increases both the number of proteins in filaments and the lengths of the filaments. 

Finally, we plot the number of sheet proteins in filaments, ⟨θ⟩, and the length of sheet segments, ⟨L_s⟩, versus protein occupation, ⟨N_p⟩, in Fig. 3(g) and (h), respectively, for different P_1 values. The behaviors of ⟨θ⟩ and ⟨L_s⟩ are similar to those of the number of proteins in filaments and ⟨L_p⟩, while increasing P_1 clearly increases the number of sheet proteins and the sheet segment lengths. Varying n_c only marginally changes ⟨L_p⟩, ⟨θ⟩, and ⟨L_s⟩ (not shown). 

In addition to the quantities plotted in Fig. 3, we present phase diagrams in which thermodynamic properties of aggregates are plotted as functions of interaction parameters. These plots yield information on when sheet and coil proteins are in equal numbers, locations we define as sheet-coil phase transitions of filaments. In Fig. 4(a), the number of sheet segments, ⟨ν⟩, and the number of sheet proteins in aggregates, ⟨θ⟩, are shown vs. σ_1 and φ, and in Fig. 4(b), ⟨ν⟩ and ⟨θ⟩ are shown vs. s_1 and φ. 
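The axes of these phase plots are Boltzmann weights rather than energies. The following is a minimal sketch of the conversion in both directions, assuming that energies are expressed in units of k_BT and using the definitions σ_1 = exp(−2R_1) and s_1 = exp(P_1) from the caption of Fig. 4. The function names are illustrative and do not come from the paper.

```python
import math


def sigma_from_R(R1):
    """Interface weight sigma_1 = exp(-2 R_1), with R_1 in units of k_B T."""
    return math.exp(-2.0 * R1)


def s_from_P(P1):
    """Sheet-sheet weight s_1 = exp(P_1), with P_1 in units of k_B T."""
    return math.exp(P1)


def R_from_sigma(sigma1):
    """Invert sigma_1 = exp(-2 R_1) for R_1 (units of k_B T)."""
    return -0.5 * math.log(sigma1)


def P_from_s(s1):
    """Invert s_1 = exp(P_1) for P_1 (units of k_B T)."""
    return math.log(s1)


# sigma_1 close to 0.05 corresponds to R_1 close to 1.5 k_B T
print(round(R_from_sigma(0.05), 3))
# s_1 = 1 corresponds to a vanishing sheet-sheet interaction
print(P_from_s(1.0))
```

With these definitions, s_1 < 1 corresponds to P_1 < 0, which is the repulsive regime referred to when reading Fig. 4(b).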

In Fig. 4(a) the maximum number of sheet segments occurs at high protein coverage and weak sheet-coil interface interactions, that is σ_1 ≈ 0.05. From this region of the phase plot, ⟨ν⟩ decreases in every direction, which means sheet segments decrease in number for smaller protein coverage, and also when the interaction energy of a sheet-coil interface increases, i.e., σ_1 ≪ 0.05. The number of sheet proteins in filaments, ⟨θ⟩, is maximal at high protein coverage, and decreases in magnitude, eventually tending toward ⟨θ⟩ = 0 as the protein coverage decreases. However, at high protein coverage, the lengths of sheet segments (not shown) are longest when the sheet-coil interface interaction is large, σ_1 → 0, and shortest when the interaction is small, σ_1 ≈ 0.05. Additionally, the curve representing equal numbers of solvent and protein on the lattice (referred to as the 'solvent/protein' curve) is not strongly dependent on the value of σ_1. However, coil-sheet transition locations tend toward higher protein coverage as the sheet-coil interface energy weakens and eventually σ_1 ≈ 0.05. 

In Fig. 4(b), the number of sheet segments, ⟨ν⟩, is maximal at high protein coverage and also when s_1 ≈ 1, where the interactions between sheet proteins are weak or zero. ⟨θ⟩ is maximal at high protein coverage and large interactions between sheets, i.e., large s_1, and decreases in every direction from this region. The solvent/protein curve occurs at essentially a fixed protein coverage for s_1 > 1, but for s_1 < 1, the curve tends slightly toward higher protein coverage. On the other hand, for large s_1, the coil-sheet transition occurs at roughly the same protein coverage (about φ = 1/2), but once s_1 decreases towards s_1 = 1, the protein coverage where the coil-sheet transition occurs increases, tending toward s_1 ≈ 1 at very high protein coverage. Thus, once s_1 < 1, interactions between sheet proteins are repulsive and the proteins in filaments are largely in coil conformations. However, this region may be unphysical, as large aggregates of proteins are known to contain β-structure. 



IV. QUASI-1D MODELS FOR PROTOFIBRILS 
AND FIBRILS 

Protein protofibrils and fibrils comprise several filaments. To study thermodynamic properties of fibrils or protofibrils, we add interaction energy terms between L_y filaments to the effective Hamiltonian and put the fibrils onto an L_y × N strip lattice, that is, a square lattice that is finite in one direction. The 2 × N strip is illustrated in Fig. 5(a). In Fig. 5(b) and (c) the representation of a proto-filament of Aβ(1-40) is shown, originally produced by Tycko and coworkers. We will model this proto-filament as two 1D filament-like structures that propagate in the x-direction, as indicated in Fig. 5(b) and (c). In our model the proto-filament could grow either by joining two filaments together, or as a quasi-1D aggregate growing from a single nucleus arranged on the y-axis. Of course, our statistical mechanical models deal only with equilibrium properties, not kinetic mechanisms of fibril growth. 

FIG. 5: (Color online) (a) Graphical illustration of Eq. (26) on a 2 × N strip lattice, with each vertex labeled by its spin and lattice-gas variables. A black dot indicates that a vertex is occupied by a sheet protein, a white square indicates solvent. Solid red lines indicate interactions between proteins along the x-axis, while dotted black lines are interactions between two sheet proteins on the y-axis. Dashed-dotted green lines indicate a boundary for a nucleus, which is the dimer (n_c = 2) positioned on the y-axis. Dashed blue lines indicate no interaction between connected vertices. Double solid lines are sheet-solvent interfaces. (b) Front view (y-z plane) of an aggregate of Aβ(1-40) proteins. (c) Side view (x-y plane) of Aβ(1-40) proteins illustrating the inter-filament interactions. 

The position of a protein or solvent is represented by a vertex within the strip, and is specified by coordinates (i, j), which are the positions on the x- and y-axis, respectively, of the strip. The total number of vertices is N_tot = L_y N. The strip lattice in Fig. 5(a) contains spin and lattice-gas variables t_i^j and n_i^j, respectively, at each vertex (i, j). The spin variables t_i^j = 0, 1, ... represent different conformation states of a protein and n_i^j = 0, 1 denotes lattice-gas or occupation states. For simplicity we assume that interactions between neighboring proteins on the y-axis are restricted to vertices that have the same index i, and the proteins occupying these sites must both be locked in the sheet conformation. We consider the strip that is composed of two identical 1D lattices aligned in register, but this does not mean the filaments have to be in register, since we allow the number of proteins to fluctuate. This is a main difference of our model from the simpler method of counting inter-filaments and loose ends in the model of van Gestel. 

The inter-filament interactions between two 1D filaments are treated using a model similar to the 2-helix chain model proposed by Skolnick and others, which uses ZB parameters for describing the inter-residue interactions between two independent α-helical protein chains. In general, the Hamiltonian for an L_y × N strip lattice that includes inter-filament interactions is written using the 1D Hamiltonian, Eq. (6), by changing the spin and lattice-gas variables t_i → t_i^j and n_i → n_i^j, respectively, as 

Ly 

where the argument j of the filament Hamiltonian refers to the jth filament. For Aβ(1-40), we take L_y = 2, illustrated in Fig. 5(b) and (c). F parametrizes the interaction energy between two sheet-linked proteins which have the same index i. That is to say, residues from neighboring filaments that are close in real space participate in stabilizing interactions between filaments. In our treatment F > 0, so the proto-fibrils and fibrils are more stable than single filaments. 
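The in-register rule for inter-filament interactions can be made concrete with a short sketch. It assumes a configuration stored as two nested lists, t[j][i] for the conformation and n[j][i] for the occupation at column i of filament j, and it assumes that the label 1 denotes the sheet state. The function names and the data layout are illustrative choices, not the paper's notation.

```python
SHEET = 1  # assumed label of the sheet conformation (0 is taken to be coil)


def count_y_contacts(t, n):
    """Count pairs of sheet proteins on neighbouring filaments that share
    the same column index i.

    t[j][i]: conformation at column i of filament j
    n[j][i]: occupation at that vertex (1 = protein, 0 = solvent)
    """
    Ly, N = len(t), len(t[0])
    contacts = 0
    for j in range(Ly - 1):
        for i in range(N):
            both_present = n[j][i] == 1 and n[j + 1][i] == 1
            both_sheet = t[j][i] == SHEET and t[j + 1][i] == SHEET
            if both_present and both_sheet:
                contacts += 1
    return contacts


def interfilament_term(t, n, F):
    """Contribution F * (number of contacts) to -beta*H, F in units of k_B T."""
    return F * count_y_contacts(t, n)


# A 2 x 4 strip: columns 0 and 3 hold a sheet protein on both filaments.
t = [[1, 1, 0, 1],
     [1, 0, 0, 1]]
n = [[1, 1, 1, 1],
     [1, 1, 0, 1]]
print(count_y_contacts(t, n))  # 2
```

Column 1 contributes nothing because the lower protein is not in the sheet state, and column 2 contributes nothing because the lower vertex is solvent.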

On the other hand, it is known that nucleation does not occur in a truly 1D system, so we consider a similar model for aggregates that positions the nucleus along the y-axis, as shown in Fig. 5(a). From this point of view the orientations of proteins in the nucleus are perpendicular to the direction of propagation (x-axis) of the fibrils, and the nucleus is now a multi-layer, quasi-1D structure on an L_y × N_T ladder. This characterization of the nucleus corresponds with the findings of Zhang and Muthukumar that the nucleus contains at least two layers of β-sheet. The nuclei will assemble into proto-fibrils that grow longer on the quasi-1D lattice. An effective Hamiltonian for quasi-1D aggregation including the multi-layer nucleus term can be written 

Ly Ly—1 

Nt 

- X^iJixW.nr^) \6{tl,l)rvi+6{tl+\'i)ni+'] 

Nt — 1 Ly — ^ 

- E ^ n X(ntnl+i) (27) 

where the term −βH_pp(j) given by Eq. (4) is, upon changing the spin and lattice-gas variables t_i → t_i^j and n_i → n_i^j, respectively, the jth effective Hamiltonian for a 1D filament in the x-direction, one for each layer of the strip lattice. In the y-direction we write analogous interactions, −βH_y, similar to those in the x-direction, except we introduce F to represent interactions between two sheet proteins. Also included in the y-direction is the nucleus term containing the parameter A, which has the same meaning of surface energy as before. 

FIG. 6: (Color online) The number of aggregates, ⟨γ⟩, is plotted against the protein coverage in (a) for model A and (b) for model B. The total number of sheet proteins in aggregates, ⟨θ⟩, is plotted in (c) for model A and (d) for model B. Green circles in (a) and (b) are the results of the 1D model for ⟨γ⟩, whereas in (c) and (d) green circles denote the results of the 1D model for ⟨θ⟩. In all cases, P_1 = 0.25k_BT, K = 1k_BT, A = 1k_BT, R_1 = 1k_BT. In all plots, the case F = 0k_BT is shown by solid black lines, F = 1k_BT by dashed red lines, and F = 5k_BT by dashed-dotted blue lines. 

For both cases the total number of proteins on a strip lattice is then N_strip = Σ_{i=1}^{N_T} Σ_{j=1}^{L_y} n_i^j, so that the grand partition function is 

\[ Q^{A(B)}_{\mathrm{strip}} = \sum_{\{t\},\{n\}} \exp\left( -\beta H^{A(B)}_{\mathrm{strip}} + \beta \mu_{pc} N_{\mathrm{strip}} \right) \tag{28} \]

where the sums over {t}, {n} are for all i and j, and A, B refers to the effective Hamiltonians given by Eqs. (25) or (26), respectively. The grand partition function is solved as Q^{A(B)}_strip = Tr (T^{A(B)}_strip)^{N_T}, where T^{A(B)}_strip is now the transfer matrix that relates nearest-neighbor spin variables t_i^j, t_{i+1}^j, t_i^{j+1}, t_{i+1}^{j+1} and lattice-gas variables n_i^j, n_{i+1}^j, n_i^{j+1}, n_{i+1}^{j+1}. Just as in Section III B, in the thermodynamic limit N_T → ∞, 

\[ (L_y N_T)^{-1} \ln Q^{A(B)}_{\mathrm{strip}} = L_y^{-1} \ln \lambda^{A(B)}_{\mathrm{strip}} \tag{29} \]

where λ^{A(B)}_strip is the largest eigenvalue of T^{A(B)}_strip. In general, the dimension of the transfer matrix grows exponentially with the strip width L_y; for example, the transfer matrix T^B_strip is (q+1)^{L_y} × (q+1)^{L_y} and has (q+1)^{L_y} eigenvalues. 
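The route from a transfer matrix to the thermodynamic-limit free energy can be sketched numerically. The example below is not the paper's T_strip: it assumes a toy lattice gas with an activity z per protein and a coupling K between proteins that are neighbours along x on the same filament, with no coupling along y. It only illustrates how the rung states are enumerated, how the largest eigenvalue is found by power iteration, and how ln(λ)/L_y follows; all function names are hypothetical.

```python
import itertools
import math


def column_states(q, Ly):
    """States of one rung: each of the Ly vertices is solvent (0) or a
    protein in one of q conformations (1..q), i.e. q + 1 choices."""
    return list(itertools.product(range(q + 1), repeat=Ly))


def toy_transfer_matrix(states, z, K):
    """Toy lattice-gas matrix (an assumption, not the paper's T_strip)."""
    def occ(c):
        return [1 if v > 0 else 0 for v in c]

    T = []
    for c in states:
        row = []
        for d in states:
            # symmetric split of the activity between the two rungs
            weight = math.sqrt(z ** sum(occ(c)) * z ** sum(occ(d)))
            # x-direction bonds between occupied vertices on the same filament
            bonds = sum(a * b for a, b in zip(occ(c), occ(d)))
            row.append(weight * math.exp(K * bonds))
        T.append(row)
    return T


def largest_eigenvalue(T, max_iter=10000, tol=1e-13):
    """Power iteration; adequate for matrices with positive entries."""
    n = len(T)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = sum(w)  # equals the eigenvalue estimate because sum(v) == 1
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) <= tol * new_lam:
            return new_lam
        lam = new_lam
    return lam


def log_Q_per_site(q, Ly, z, K):
    """Thermodynamic-limit ln Q / (Ly * N_T) = ln(lambda_max) / Ly."""
    T = toy_transfer_matrix(column_states(q, Ly), z, K)
    return math.log(largest_eigenvalue(T)) / Ly


print(len(column_states(2, 2)))  # 9 = (2 + 1)**2 rung states
```

In this toy model two uncoupled filaments give the same value per site as a single filament, which mirrors the statement that non-interacting filaments reproduce the 1D result.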

The normalized average number of sheet proteins for either case A or B is calculated by substituting Eq. (28) into Eq. (21) and dividing by L_y N_T. Additionally, the normalized number of sheet interactions in the y-direction is given by 

\[ \frac{1}{L_y - 1} \frac{\partial}{\partial F} \ln \lambda^{A(B)}_{\mathrm{strip}} \tag{30} \]

for either case A or B. Additionally, the number of aggregates on the strip, ⟨γ⟩, is found by substituting Eq. (29) into Eq. (18) and normalizing with respect to L_y N_T. The quantity that gave the number of proteins in filaments in the 1D model now yields the total polymerization of aggregates on the strip lattice, but it does not yield the correct number of proteins in aggregates. Additionally, ⟨ν⟩ is now the number of sheet-coil or sheet-solvent boundaries, and does not yield simply the number of sheet segments. Thus, the lengths of aggregates and the lengths of sheet segments are no longer well-defined for the strip models. These quantities could be defined with a more sophisticated description of aggregates on the strip lattice, for example, by introducing more parameters. For now we try to use a minimum number of parameters and focus on the number of aggregates and the number of sheet proteins in aggregates implied by Eqs. (25) and (26), both of which are experimentally measurable properties. 
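The derivative of ln λ with respect to F can be checked numerically on a case where the answer is known in closed form. The sketch below assumes a toy L_y = 2 rung model with no coupling along x, in which each vertex is either non-sheet (0) or sheet (1) with activity z and a rung with two sheet proteins gains the factor exp(F). This is an illustrative simplification, not the strip Hamiltonian of the paper, and the function names are hypothetical.

```python
import itertools
import math


def lambda_max(z, F):
    """Largest eigenvalue of the toy rung model.

    With no x-coupling the symmetric transfer matrix T[c][d] = sqrt(w_c * w_d)
    has rank one, so its largest eigenvalue is simply the sum of the rung
    weights w_c.
    """
    states = itertools.product((0, 1), repeat=2)
    return sum(z ** (a + b) * math.exp(F * a * b) for a, b in states)


def y_contacts_per_rung(z, F, Ly=2, h=1e-5):
    """Central-difference estimate of (Ly - 1)^-1 d ln(lambda_max) / dF."""
    dlog = math.log(lambda_max(z, F + h)) - math.log(lambda_max(z, F - h))
    return dlog / (2.0 * h) / (Ly - 1)


def exact_contacts(z, F):
    """Probability that both vertices of a rung are sheet in the toy model."""
    return z * z * math.exp(F) / (1.0 + 2.0 * z + z * z * math.exp(F))


print(y_contacts_per_rung(1.0, 1.0), exact_contacts(1.0, 1.0))
```

For a full strip model the closed form is not available, but the same finite-difference step can be applied to the numerically computed largest eigenvalue.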

In Fig. 6 we compare qualitatively the results of the L_y = 2 strip models discussed above for F = 1k_BT and 5k_BT inter-filament interactions with those of two non-interacting filaments, i.e., F = 0k_BT. We also plot results from the 1D model for the same model parameters. Fig. 6(a) and (b) show the number of aggregates, ⟨γ⟩, vs. protein coverage for cases A and B, respectively, with n_c = 2. In Fig. 6(a), as φ increases, the number of aggregates increases from zero and reaches a maximum, then decreases toward zero at maximum protein coverage. Case A yields the results of the 1D model when F = 0k_BT. Overall, increasing F rapidly suppresses the number of aggregates. In case B, the location of the maximum number of aggregates occurs at higher protein coverage when compared with the 1D model when F = 0k_BT. Also, increasing F seems to decrease the number of aggregates more slowly for case B when compared with case A for the same model parameters. There are also fewer aggregates in case B when compared to case A. 

The number of sheet proteins in filaments, ⟨θ⟩, is plotted in Fig. 6(c) and (d) for cases A and B, respectively. As protein coverage increases the number of sheet proteins in aggregates increases, more rapidly for increasing F. Both models A and B yield essentially the same results for the number of sheet proteins in aggregates for non-zero F. When F = 0k_BT, model A predicts more sheet proteins in aggregates at low protein coverage when compared to model B, while at high protein coverage model B contains more sheet proteins in aggregates than model A. Thus, overall, increasing the interchain interaction, F, seems to increase the number of sheet proteins, but also seems to decrease the number of aggregates. This means the number of sheet proteins in aggregates increases rapidly with F, a fact consistent with increasing sheet content. This must mean that the size of aggregates and the length of sheet segments increase with F. 



V. COMPARISON TO EXPERIMENT 

Of course, the most important test of a model is whether it can yield results in agreement with experimental observations. In this section, we compare model predictions with the experimental results on Aβ(1-40) of Terzi et al. and on Curli fibrils in Ref. [50]. In their work, Terzi et al. used CD spectroscopy, titration calorimetry, and analytical centrifugation to analyze the self-association of Aβ(1-40). In aqueous solutions, they showed that Aβ(1-40) exhibited a reversible, concentration-dependent sheet-coil transition. Using CD spectroscopy, they obtained the fraction of sheet proteins in aggregates, taken at different concentrations. For our purposes, since the stable oligomer of Aβ(1-40) could be the dimer, we use n_c = 2 in Eq. (26) and calculate Eq. (21). We also tried to use the n_c = 2 1D model described by Eq. (6), which did not produce an acceptable fit. The strip model does produce a good fit for Aβ(1-40) aggregates, and is consistent with experimental results. 

To work with experimental concentration, c, we must also specify the other chemical potential contributions in Eq. (3) for Aβ(1-40): μ_ST, μ_SR and μ_PV. We then calculate μ_pc using the experimental concentrations, and then insert μ_pc into Eq. (28), from which relevant thermodynamical properties are obtainable. Additionally, in our calculations a 1 mM reference was used in computing the contributions to the solution chemical potential from the experimental concentrations. For Aβ(1-40), we have μ_ST + μ_SR ≈ −29 kcal/mol. For hemoglobin, μ_PV was previously found to be approximately 0.75 (μ_ST + μ_SR). We use a similar result for μ_PV for Aβ(1-40), but in reality μ_PV could be larger since Aβ(1-40) aggregates may be more flexible than hemoglobin aggregates. We substitute Eq. (26) into Eq. (28), then the result into Eqs. (17) and (21), and normalize both quantities with respect to L_y N_T. Eq. (17) divided by Eq. (21), ⟨θ⟩/⟨N_p⟩, the β-sheet fraction, is used as our fitting function. The results are plotted in Fig. 7(a). As a measure of the quality of the fit we calculate the quantity $\eta/N_d = \sqrt{\sum_k (\langle\theta_k\rangle - \theta_k)^2}/N_d$, where ⟨θ_k⟩ is the theoretical value at the kth concentration, θ_k is the experimental value, and N_d is the number of data points in the experiments. The fit yields reasonable free energies at room temperature, P_1 ≈ K ≈ A ≈ 0 kcal/mol, R_1 = 0.35 kcal/mol, and F = 16.4 kcal/mol, and overall a good fit with η/N_d = 0.007. 
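The fit-quality measure is simple to evaluate once theoretical and measured sheet fractions are available at the same concentrations. The sketch below implements η/N_d as the square root of the summed squared deviations divided by the number of data points. The function name is illustrative and the numbers are hypothetical placeholders, not the Terzi data.

```python
import math


def fit_quality(theory, experiment):
    """Return eta / N_d with eta = sqrt(sum_k (theory_k - experiment_k)**2)."""
    if len(theory) != len(experiment) or not theory:
        raise ValueError("need two non-empty sequences of equal length")
    eta = math.sqrt(sum((t - e) ** 2 for t, e in zip(theory, experiment)))
    return eta / len(theory)


# Hypothetical sheet fractions, for illustration only.
theory = [0.10, 0.40, 0.80]
experiment = [0.10, 0.40, 0.50]
print(fit_quality(theory, experiment))
```

Because the square root is taken before dividing by N_d, this measure differs from a root-mean-square deviation by a factor of the square root of N_d, so values should only be compared between data sets of similar size.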

With n_c = 2, the fitted parameters of our model suggest that Aβ(1-40) aggregates will grow easily, as indicated by A ≈ 0 kcal/mol, and with F = 16.4 kcal/mol, the proteins in aggregates are strongly favored to be in the sheet state and bonded with a neighbor in the y-direction. With K ≈ 0 kcal/mol, the aggregates are dominated by sheet structure, with very little coil structure. A fitting value of R_1 = 0.35 kcal/mol suggests that the proteins in aggregates must first overcome an energy barrier before converting from the coil state to the sheet state. Aggregates that form propagate in the x-direction, and the propagation is primarily driven by interactions between sheet proteins in the y-direction rather than directly by the interactions along the x-direction, as indicated by P_1 ≈ 0 kcal/mol. Thus, once nuclei that are dominated by sheet structure form, aggregates will grow in the x-direction. 

FIG. 7: (Color online) In (a), the fraction of sheet proteins in Aβ(1-40) aggregates, ⟨θ⟩/⟨N_p⟩, is fitted to the results of the Terzi et al. experiment. In (b), the fraction of sheet proteins in Curli fibrils is fitted to the scaled results of the Hammer et al. experiment. For the Terzi data, fit parameters were P_1 ≈ K ≈ A ≈ 0 kcal/mol, R_1 = 0.35 kcal/mol, and F = 16.4 kcal/mol. For the Hammer data, P_1 = 7.26 kcal/mol, K = 2.2 kcal/mol, R_1 ≈ 0 kcal/mol, and A = 1.2 kcal/mol. In (a) we used case B of the strip models with n_c = 2 and Eq. (21) as the fit function, whereas in (b) we used the 1D model with n_c = 2 for aggregation and Eq. (19) as the fit function. In both cases q = 2, and Eq. (19) is divided by ⟨N_p⟩ for (a) the strip model and (b) the 1D model, respectively. 
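To compare the fitted energies, quoted in kcal/mol, with the model parameters used in the figures, which are in units of k_BT, a unit conversion is needed. The sketch below assumes that "room temperature" means 298.15 K, which is an assumption since the text does not give the value, and uses the molar gas constant in kcal/(mol K). The function name is illustrative.

```python
R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol K)
T_ROOM = 298.15       # assumed value of "room temperature" in kelvin


def kcal_to_kT(energy_kcal_per_mol, temperature=T_ROOM):
    """Express an energy given in kcal/mol in units of k_B T."""
    return energy_kcal_per_mol / (R_KCAL * temperature)


# Fitted values quoted in the text for the A-beta(1-40) data
for name, value in (("F", 16.4), ("R_1", 0.35)):
    print(name, round(kcal_to_kT(value), 2))
```

Under this assumption F = 16.4 kcal/mol is roughly 28 k_BT, while R_1 = 0.35 kcal/mol is about 0.6 k_BT.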

Hammer et al. studied fibrils called Curli. These non-branching, β-rich fibrils are produced by enteric bacteria, such as E. coli, and are composed of multiple types of proteins. The major subunit is the CsgA protein, which is nucleated into fibrils by another protein, CsgB. Since our model contains only identical proteins, we assume no difference between CsgA and the other proteins in Curli fibrils. We test our model on the experiment carried out by Hammer et al., where aggregates at different concentrations of CsgB were detected by Thioflavin T, and TEM analysis at various concentrations revealed the ultrastructure of aggregates at the steady state. Since the experiments used Thioflavin T, which binds to fibrils, we scale the fluorescence data with respect to the fluorescence signal of the highest concentration examined (c_0 = 43 μM in their experiments). Here we plot the relative β-sheet content, not the absolute content as with the Terzi data, and divide the number of sheet proteins in filaments, ⟨θ⟩, by ⟨θ⟩_0, which is the fluorescence signal at c_0. The 1D model for aggregation produced an acceptable fit, but the size of a critical nucleus for Curli fibrils is not currently known, so we choose n_c = 2 and substitute Eq. (6) into Eq. (9). Then, plugging the result into the definitions of ⟨θ⟩ and ⟨N_p⟩, we use as our fit function ⟨θ⟩/⟨N_p⟩. The data points for different concentrations of CsgB and the theoretical fit are plotted in Fig. 7(b). At room temperature, we find for CsgB μ_ST + μ_SR ≈ −32 kcal/mol and μ_PV ≈ −25 kcal/mol. The fitting parameters for the Hammer data were P_1 = 7.26 kcal/mol, K = 2.2 kcal/mol, R_1 ≈ 0 kcal/mol, and A = 1.2 kcal/mol, and overall a good fit with η/N_d = 0.008, where N_d is the number of Curli fibril data points. 
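The scaling of the fluorescence data can be written down in a few lines. The sketch below divides each signal by the signal recorded at the highest concentration, as described above. The function name and the sample numbers are hypothetical and are not the Hammer et al. measurements.

```python
def relative_signal(concentrations, signals):
    """Scale signals by the signal measured at the highest concentration."""
    if len(concentrations) != len(signals) or not signals:
        raise ValueError("need two non-empty sequences of equal length")
    ref_index = max(range(len(concentrations)), key=lambda k: concentrations[k])
    reference = signals[ref_index]
    if reference == 0:
        raise ValueError("reference signal must be non-zero")
    return [s / reference for s in signals]


# Hypothetical concentrations (micromolar) and fluorescence readings.
conc = [5.0, 20.0, 43.0]
fluo = [10.0, 40.0, 80.0]
print(relative_signal(conc, fluo))  # [0.125, 0.5, 1.0]
```

The scaled values are relative sheet contents, so the point at the highest concentration equals one by construction.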

For Curli fibrils, the fitting value of A = 1.2 kcal/mol suggests that nuclei will form after small assemblies overcome an energy barrier. Since K = 2.2 kcal/mol, proteins tend to form aggregates. Additionally, P_1 = 7.26 kcal/mol provides strong attraction between sheet proteins, thus monomers in the aggregate will preferentially convert to the sheet state over the coil state. With R_1 ≈ 0 kcal/mol, sheet proteins aggregate without overcoming an energy barrier and can convert easily from coil. Thus, the model predicts that the transition from CsgB monomers to Curli fibrils is largely determined by interactions between sheet proteins, and the fibrils largely contain β-structure. 



VI. THREE-STATE POTTS MODEL FOR 
HELIX-SHEET-COIL AGGREGATES 




FIG. 8: (Color online) (a) Normalized ⟨θ_1⟩ at a particular (s_1, φ) may vary from zero (white color) to one (solid red color). Additionally, contour lines specify the value of ⟨θ_2⟩ at a particular (φ, s_1). (b) ⟨θ_1⟩ and ⟨θ_2⟩ with the same identifications as in (a), except each quantity is evaluated at a particular (φ, s_2). A dotted line indicates equal populations of solvent and proteins in aggregates, a dashed line in both plots indicates sheet-coil/helix transitions, ⟨θ_1⟩ = 0.5, with the remaining proteins either helix or coil, and a black contour line labeled 0.5 in both plots indicates helix-coil/sheet transitions, with ⟨θ_2⟩ = 0.5 and the remaining proteins either sheet or coil. A dashed-dotted line in both plots indicates equal fractions of helix and sheet. Both ⟨θ_1⟩ and ⟨θ_2⟩ have been normalized with respect to system size N_T. In all cases, n_c = 2, and unless otherwise stated K = P_1 = 1k_BT, P_2 = 1k_BT, R_1 = 1k_BT, R_2 = 0.5k_BT, R_3 = 0k_BT and A = 1k_BT. 



In this section, we study protein aggregation based on a 3-state (e.g., helix-sheet-coil) 1D lattice-gas model. The lattice-gas Hamiltonian for aggregates containing helix, sheet, or coil conformations is written similarly to the 2-state Hamiltonian, except we add interaction terms for helical proteins as given by 

where N_T is the size of the lattice, and the notation for R(t_i, t_{i+1}) was discussed in Section III. After substituting Eq. (32) into Eq. (31), then plugging into Eq. (9), we may define relevant thermodynamical quantities as 

(33) 
(34) 

where ⟨θ_i⟩ refers to the fraction of sheet, i = 1, or helix, i = 2, and j = 0, 1, 2 in ⟨ν_j⟩ refers to helix-coil/or solvent, sheet-coil/or solvent, or sheet-helix interfaces, respectively. With these definitions, the number of helix segments is (⟨ν_0⟩ + ⟨ν_2⟩)/2, and the number of sheet segments is (⟨ν_1⟩ + ⟨ν_2⟩)/2. In Fig. 8 we imposed periodic boundary conditions and computed phase plots for these quantities in the thermodynamic limit. In general, the 3-state model yields richer behaviors than the 2-state model, because helical proteins may also participate in the binding of aggregates. We plot the number of sheet proteins in filaments, ⟨θ_1⟩, and the number of helical proteins in filaments, ⟨θ_2⟩, vs. φ and s_1 in Fig. 8(a), and vs. φ and s_2 in Fig. 8(b). Additionally, s_i = exp(P_i) for i = 1, 2, and the protein coverage φ is defined as before. 
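The relation between interface counts and segment counts on a periodic lattice can be verified by brute force on a single configuration. The sketch below assumes a chain written as a string with one character per site (the four labels are arbitrary choices, not the paper's notation), counts the three interface types, and compares the resulting segment numbers with a direct count of segment starts.

```python
COIL, SHEET, HELIX, SOLVENT = "c", "s", "h", "_"


def interface_counts(chain):
    """Return [nu0, nu1, nu2] for a periodic chain.

    nu0: helix next to coil or solvent
    nu1: sheet next to coil or solvent
    nu2: sheet next to helix
    """
    nu = [0, 0, 0]
    N = len(chain)
    for i in range(N):
        pair = {chain[i], chain[(i + 1) % N]}
        if HELIX in pair and pair & {COIL, SOLVENT}:
            nu[0] += 1
        elif SHEET in pair and pair & {COIL, SOLVENT}:
            nu[1] += 1
        elif pair == {SHEET, HELIX}:
            nu[2] += 1
    return nu


def segment_count(chain, state):
    """Directly count runs of `state` on the periodic chain."""
    # chain[i - 1] wraps around for i = 0, matching periodic boundaries
    return sum(1 for i in range(len(chain))
               if chain[i] == state and chain[i - 1] != state)


chain = "ss_hhc_shhs"
nu0, nu1, nu2 = interface_counts(chain)
print((nu0 + nu2) // 2, segment_count(chain, HELIX))  # helix segments
print((nu1 + nu2) // 2, segment_count(chain, SHEET))  # sheet segments
```

Each segment has exactly two ends on a ring, which is why the interface totals are halved. A ring filled entirely with one conformation has no interfaces and is counted as zero segments by both methods.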



In Fig. 8(a), the locations of equal parts helix and sheet proteins in filaments at medium to high protein coverage occur when s_1 ≈ s_2, that is, when the sheet and helical interaction energies are roughly the same magnitude. As φ decreases, the helix/sheet curve occurs for s_1 > s_2, with s_1 slowly increasing. The sheet-coil/helix transition location, ⟨θ_1⟩ = 0.5, is only weakly dependent on large values of s_1, but once the sheet interactions weaken and become close in magnitude to helical interactions, the transition locations tend to higher protein coverage, where eventually s_1 ≈ s_2. A transition to majority helical proteins in aggregates occurs only when sheet protein interactions are weaker than attractive, helical protein interactions, that is s_1 < s_2 with s_2 > 1. Additionally, the number of sheet proteins in filaments, ⟨θ_1⟩, is maximal at high values of s_1 and large protein coverage, and decreases in every direction from this region. Meanwhile, the number of helical proteins in filaments, ⟨θ_2⟩, is maximal at low values of s_1 and high protein coverage, and decreases in every direction from this region. 

In Fig. 8(b), the location of equal parts helix and sheet proteins is mainly independent of φ and occurs at s_2 ≈ 2.5 for high φ, that is s_1 ≈ s_2. As φ decreases, the locations of helix/sheet transitions occur when s_1 > s_2, with s_2 slowly decreasing. The transition to majority helical proteins in aggregates occurs for s_2 > s_1, with the locations of transitions occurring at smaller protein coverage as s_2 increases. Sheet-coil/helix transitions occur at progressively higher protein coverage for increasing s_2 and disappear when s_2 ≈ 2.5, that is, once helical interactions become stronger than sheet interactions. As in Fig. 8(a), when the sheet and helical interactions are attractive, sheet proteins in aggregates dominate at high protein coverage when s_1 > s_2, and helical proteins dominate at high φ when s_1 < s_2. 

VII. DISCUSSION AND CONCLUSIONS 

We have found for the 2-state models with attractive interactions between sheet proteins two regimes: small, largely unstructured aggregates at low protein concentrations, and long sheet-dominated filaments at high protein concentrations. The transition from one regime to the other is largely concentration driven, but with the inclusion of nuclei at low concentrations, we found in Fig. 3(a-b) and (e-f) that fewer filaments form as the size of the nuclei increases. At high concentration, the number of proteins in filaments, and those in the filaments that are sheet, are largely independent of n_c. We also proposed, in addition to the 1D model for aggregation, a quasi-1D model that more realistically captures the nucleation process, where the nucleus contains at least two layers of protein perpendicular to the direction of propagation of the aggregate, so that the nucleus is a quasi-1D structure. We found that when the interactions between different layers were strongly attractive, the quasi-1D model yielded essentially the same results as the 1D model for the number of sheet proteins in aggregates. When using the same fit parameters, the number of aggregates showed a strong dependence on F, where increasing F suppressed the number of aggregates in both strip models, but more significantly for the two-filament model when compared to the quasi-1D nucleus model. 

We tested the predictions of the 2-state (coil-sheet) models, using the fraction of sheet proteins in aggregates, ⟨θ⟩/⟨N_p⟩, to compare with the experimental results for Aβ(1-40), where we used the strip model for fibrils, and for Curli fibrils, where we used the 1D model for fibrils. Fits of both data sets yielded very good agreement. Each of these proteins aggregates into amyloid fibrils through potentially different pathways, thus our model could potentially be applied to a wide variety of pathways in which amyloid fibrils are formed at different concentrations. 



For the 3-state model, we found transitions between three regions: sheet-dominated regions when helical conformation interactions are weak, helix-dominated regions when sheet conformation interactions are weak, and coil-dominated aggregates when both helical and sheet conformation interactions are weak. In reality, for protein fibrils only the first of the three cases is experimentally relevant. Our model results primarily differ from those of the recent WSME model for aggregation, which is a peptide bond based model, since it does not consider interactions between helix and coil proteins, only interactions between sheet proteins. By using Potts models in a grand canonical ensemble, our approach to aggregation is quite general and could allow the possibility for helix and coil proteins to participate in aggregation. The Potts model has an advantage over simpler models for aggregation because it allows more conformational states to be considered for proteins, a feature which may prove useful as future experiments involving these characteristics become accessible. 

In conclusion, we have developed statistical mechanical approaches to describe the aggregation of proteins into fibrils in equilibrium. Protein folding and aggregation involve a large number of degrees of freedom, thus it is important to make simplifications when possible. The 1D and quasi-1D statistical mechanical models proposed here have few parameters and are exactly solvable. For some peptides responsible for neurodegenerative diseases, such as Aβ, it is not yet clear whether small oligomers and nuclei are thermodynamically stable, but here we assumed that assemblies from nuclei to fibrils are thermodynamically stable. Calculated thermodynamic quantities mimic certain measurable properties of amyloid fibrils, such as the number of aggregates, the number of sheet segments, and the average lengths of filaments and sheet segments. In order to further test our models, experiments such as AFM measurements of fibril lengths, as was done by van Raaij et al., CD spectra of the sheet content at different concentrations, as in the Terzi data, and also ThT experiments, as in the work of Hammer et al., should be carried out for various protein species. Additionally, proteins that are known to exhibit more than just 2-state folding ought to be further studied. The 3-state model presented here has the power to capture more complicated aggregation phenomena, where conformations such as helix (and others) may play a role when protein monomers join larger aggregates. With more experimental data, we will be able to draw quantitative comparisons between proteins that aggregate and compile a table of parameters based on our model. 